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1. Introduction 

It is common to model the joint probability distribution of a family of n 
random variables {Xi, . . . , Xn} in two stages: First to specify the condi- 
tional dependence structure of the distribution, then to specify details of the 
conditional distributions of the variables within that structure (see p. 1274 
of (author?) 8, or p. 180 of (author?) 3, for example). The structure may 
be summarized in a variety of ways in the form of a graph Q = (V, £) whose 
vertices V = {1, n} index the variables {Xi} and whose edges C V x V 
in some way encode conditional dependence. We follow the Hammersley- 
Clifford approach (2; 14), in which {i,j) G if and only if the conditional 
distribution of Xi given all other variables {X^ : k ^ i} depends on Xj, i.e., 
differs from the conditional distribution of X^ given {X^ : k ^ i,j}. In this 
case the distribution is said to be Markov with respect to the graph. One 
can show that this graph is symmetric or undirected, i.e., all the elements 
of £ are unordered pairs. 

Our primary goal is the construction of informative prior distributions on 
undirected graphs, motivated by the problem of Bayesian inference of the 
dependence structure of families of observed random variables. As a side 
benefit our approach also yields estimates of the conditional distributions 
given the graph. The model space of undirected graphs grows quickly with 
the dimension of the vector (there are 2"^"'"^)/^ undirected graphs on n 
vertices) and is difficult to parametrize. We propose a novel parametrization 
and a simple, flexible family of prior distributions on Q and on Markov 
probability distributions with respect to Q (8); this parametrization is based 
on computing the intersection pattern of a system of convex sets in W^. 
The novelty and main contribution of this paper is structural inference for 
graphical models, specifically, the proposed representation of graph spaces 
allows for flexible prior distributions and new Markov chain Monte Carlo 
(MCMC) algorithms. 

The simultaneous inference of a decomposable graph and marginal distri- 
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butions in a fully Bayesian framework was approached in (13) using local 
proposals to sample graph space. A promising extension of this approach 
called Shotgun Stochastic Search (SSS) takes advantage of parallel comput- 
ing to select from a batch of local moves (19). A stochastic search method 
that incorporates both local moves and more aggressive global moves in 
graph space has been developed by (author?) (31). These stochastic search 
methods are intended to identify regions with high posterior probability, but 
their convergence properties are still not well understood. Bayesian models 
for non-decomposable graphs have been proposed by (author?) (30) and 
by (author?) (34). These two approaches focus on Monte Carlo sampling 
of the posterior distribution from specified hyper Markov prior laws. Their 
emphasis is on the computational problem of Monte Carlo simulation, not 
on that of constructing interesting informative priors on graphs. We think 
there is need for methodology that offers both efficient exploration of the 
model space and a simple and flexible family of distributions on graphs that 
can reflect meaningful prior information. 

Erdos-Renyi random graphs (those in which each of the (2) possible undi- 
rected edges is included in £ independently with some specified proba- 
bility p £ [0, 1]), and variations where the edge inclusion probabilities pij are 
allowed to be edge-specific, have been used to place informative priors on 
decomposable graphs (16; 21). The number of parameters in this prior spec- 
ification can be enormous if the inclusion probabilities are allowed to vary, 
and some interesting features of graphs (such as decomposability) cannot 
be expressed solely through edge probabilities, (author?) (22) developed 
methods for placing informative distributions on directed graphs by using 
concordance functions (functions that increase as the graph agrees more 
with a specified feature) as potentials in a Markov model. This approach is 
tractable, but it is still not clear how to encode certain common assumptions 
{e.g., decomposability) within such a framework. 

For the special case of jointly Gaussian variables {Xj}, or those with 
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arbitrary marginal distributions Fj{-) whose dependence is adequately rep- 
resented in Gaussian copula form Xj = F^^ [^{Zj)) for jointly Gaussian 
{Zj} with zero mean and unit-diagonal covariance matrix Cij, the problem 
of studying conditional independence reduces to a search for zeros in the pre- 
cision matrix C~^. This approach (see 17, for example) is faster and easier to 
implement than ours in cases where both are applicable, but is far more lim- 
ited in the range of dependencies it allows. For example, a three-dimensional 
model in which each pair of variables is conditionally independent given the 
third cannot be distinguished from a model with complete joint dependence 
of the three variables (we return to this example in Section 5.3). 

In this article we establish a novel approach to parametrize spaces of 
graphs. For any integers n, d £ N, we show in Section 2 how to use the 
geometrical configuration of a set {vi} of n points in Euclidean space to 
determine a graph Q = {V,£) on V = {vi, ...,Vn}- Any prior distribution on 
point sets {vi} induces a prior distribution on graphs, and sampling from 
the posterior distribution of graphs is reduced to sampling from spatial con- 
figurations of point sets — a standard problem in spatial modeling. Relations 
between graphs and finite sets of points have arisen earlier in the fields of 
computational topology (9) and random geometric graphs (25). From the 
former we borrow the idea of nerves, i.e., simplicial complexes computed 
from intersection patterns of convex subsets of M*^; the 1-skeletons (collec- 
tion of 1-dimensional simplices) of nerves are geometric graphs. 

From the random geometric graph approach we gain understanding about 
the induced distribution on graph features when making certain features of 
a geometric graph (or hypergraph) stochastic. 

1.1. Graphical models 

The graphical models framework is concerned with the representation of 
conditional dependencies for a multivariate distribution in the form of a 
graph or hypergraph. We first review relevant graph theoretical concepts 
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and then relate these concepts to factorizing distributions. 

A graph Q is an ordered pair (V,<S) of a set V of vertices and a set 
C V X V of edges. If all edges are unordered (resp., ordered), the graph is 
said to be undirected (resp., directed). All graphs considered in this paper 
are undirected, unless stated otherwise. A hypergraph, denoted 7i, consists 
of a vertex set V and a collection /C of unordered subsets of V (known as 
hyperedges); a graph is the special case where all the subsets are vertex pairs. 
A graph is complete if i5 = V x V contains all possible edges; otherwise it is 
incomplete. A complete subgraph that is maximal with respect to inclusion is 
a clique. Denote by ^{G) and cS(^), respectively, the collection of complete 
sets and cliques of ^. A path between two vertices {vi,Vj} G V is a sequence of 
edges connecting Vi to vj. A graph such that any pair of vertices can be joined 
by a unique path is a tree. A weak decomposition of an incomplete graph 
Q = (V, £) is a partition of V into disjoint nonempty sets (A, B, S) such 
that 5 is complete in Q and separates A and B, i.e., any path from a vertex 
in ^ to a vertex in B must pass through S. Iterative weak decomposition 
of a graph Q such that at each step the separator Si is minimal and the 
subsets Ai and Bi are nonempty generates the prime components of G, the 
collection of subgraphs that cannot be further weakly decomposed. If all 
prime components of a graph Q are complete, then Q is said to be weakly 
decomposable. Any graph Q can be represented as a tree T whose vertices are 
its prime components ^{G); this is called its junction tree representation. 
A junction tree is a hypergraph. 

Let V he a probability distribution on M" and X = {Xi, . . . , X„) a ran- 
dom vector with distribution V. Graphical modeling is the representation 
of the Markov or conditional dependence structure among the components 
{Xi} in the form of a graph G = (V,<S). Denote by f{x) the joint density 
function of {Xi} (or probability mass function for discrete distributions — 
more generally, density for an arbitrary reference measure) . The distribution 
V (and hence its density f{x)) may depend implicitly on a vector 9 of pa- 
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rameters, taking values in some set Qg, which in some cases will depend on 
the graph Q; write G = UQg for the disjoint union of the parameter spaces 
for all graphs on V. 

Each vertex Vi £ V is associated with a variable Xi, and the edges £ 
determine how the distribution factors. The density /(x) for the distribution 
can be factored in a variety of ways associated with the graph Q (20, p. 35). 
It may be factored in terms of complete sets a £ ^{Q): 

f{x)= n MXa\0a), (1.1a) 

ae<r(g) 

or similarly in terms of cliques a G cS; if ^ is decomposable then f{x) may 
also be factored in junction-tree form as: 

;(^) = Uaeng)M^-\^-\ (lib) 
Ub^yiG) I Ob) ' 

where ^{G) and ^(^) denote the prime factors and separators of G, re- 
spectively, and where ipa{xa \ ^a) denotes the marginal joint density for the 
components Xa for prime factors a £ 3^{G) and tl^bi^b I &b) that for separa- 
tors b £ y{G) (8, Eqn. (6)). In the Gaussian similar factorization 
to (1.1b) holds even for non-decomposable graphs (30, Prop. 2), but ilJa{xa) 
and '4'b{xb) lose their simple interpretation. 

The prior distributions required for Bayesian inference about models of 
the form (1.1) may be specified by giving a marginal distribution on the set 
of all graphs G £'^n onn vertices and conditional distributions on each Sg, 
the space of parameters for that graph: 

p{G,e)=p{G) p{9\G), G£'Sn,e£Qg (1.2) 

where 9 £ Qg determines the parameters {6a ■ a £ '^{G)} or {6a : a £ ^{G)} 
and {6b : b £ y(G)}- (author?) (12) pursue this approach in the Gaussian 
case, while (author?) (8) offer a rigorous framework for specifying more gen- 
eral prior distributions on Qg. Such priors, called hyper Markov laws, inherit 
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the conditional independence structure from the samphng distribution, now 
at the parameter level. The hyper Inverse Wishart, useful when the factors 
are multivariate normal, is by far the most studied hyper Markov law. Most 
previously studied models of the form (1.2) specify very little structure on 
p{G) (12; 16; 30) — typically p{G) is taken to be a uniform distribution on the 
space of decomposable (or unrestricted) graphs, or perhaps an Erdos-Renyi 
prior to encourage sparsity (21), with no additional structure or constraints 
and hence no opportunity to express prior knowledge or belief. 

Two inference problems arise for the model specified in (1.2): inference of 
the entire joint posterior distribution of the graph and factor parameters, 
(0,0), or inference of only the conditional independence structure, which 
entails comparing different graphs via the marginal likelihood 

Pr I x} oc / f{x I e, G) p{G) p{9 I G) dO. 

Inference about G may now be viewed as a Bayesian model selection proce- 
dure (see 28, p. 348). 

2. Geometric Graphs 

Most methodology for structural inference in graphical models either as- 
sumes little prior structure on graph space, or else represents graphs using 
high dimensional discrete spaces with no obvious geometry or metric. In ei- 
ther case prior elicitation and posterior sampling can be challenging. In this 
section we propose parametrizations of graph space that will be used in Sec- 
tion 3 to specify flexible prior distributions and to construct new Metropolis/ 
Hastings MCMC algorithms with local and global moves. The key idea for 
this parametrization is to construct graphs and hypergraphs from intersec- 
tions of convex sets in W^. 

We illustrate the approach with an example. Fix a convex region A CM.'^ 
and let V C j4 be a finite set of n points. For each number r > 0, the 
proximity graph Prox(V, r) (see Figure 1) is formed by joining every pair of 
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(unordered) elements in V whose distance is 2r or less, i.e., whose closed balls 
of radius r intersect. As r ranges from to half the diameter of A, the graph 
Prox(V, r) ranges from the totally disconnected graph to the complete graph. 
This example is a particular case of a more general construction illustrated 
in Figure 2; hypergraphs can be computed from properties of intersections of 
classes of convex subsets in Euclidean space. The convex sets we consider are 
subsets of that are simple to parametrize and compute. The key concept 
in our construction is the nerve: 

Definition 2.1 (Nerve). Let F = {Aj, j G 1} be a finite collection of dis- 
tinct nonempty convex sets. The nerve of F is given by 

Nrv(F) = |a C / : p| / 0| . 

The nerve of a family of sets uniquely determines a hypergraph. We use 
the following three nerves in this paper to construct hypergraphs (for more 
details, see 9). 

Definition 2.2 (Cech Complex). Let V be a finite set of points in and r > 
0. Denote by B'^ the closed unit ball in M'^. The Cech complex corresponding 
to V and r is the nerve of the sets B^^r = v + rB'^, G V. This is denoted 
by Nrv(V, r, Cech). 

Definition 2.3 (Delaunay Triangulation). Let V be a finite set of points in 
R*^. The Delaunay triangulation corresponding to V is the nerve of the sets 
Ct, = G M"^ : ||rE — v\\ < \\x — u\\, n G v| for i; G V. This is denoted by 
Nrv(V, Delaunay), and the sets are called Voronoi cells. 

Definition 2.4 (Alpha Complex). Let V be a finite set of points in and 
r > 0. The Alpha complex corresponding to V and r is the nerve of the sets 
By^r n Cv, V gV. This is denoted by Nrv(V, r. Alpha). 

The nerve of a family of sets is a particular class of hypergraphs known 
as (abstract) simplicial complexes. 
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Definition 2.5 (Abstract simplicial complex). Let V be a finite set. A 
simplicial complex witli base set V is a family /C of subsets of V such that 
T € IC and a t implies a £ IC. The elements of /C are called simplices, and 
the number of connected components of IC is denoted tJ(A^)- 

The nerve of a collection of sets is always a hypergraph; in simple cases, 
only vertex pairs arise so the 1-skeleton determines a unique graph. 

Definition 2.6 (p-skeleton) . Let /C be a simplicial complex, and denote by 
\t\ the cardinality of a simplex t £ IC. The p-skeleton of IC is the collection 
of all r € /C such that |r| < p + 1. The elements of the p-skeleton are called 
p-simplices and the 1-skeleton is just a graph (more precisely, it is V U iS for 
a uniquely determined graph Q = {V,£)). 

The 1-skeleton of a nerve is the graph obtained by considering only 
nonempty pairwise intersections. The process of obtaining the nerve and 
the 1-skeleton from a family of sets is illustrated in Figure 3. Different fam- 
ilies of convex sets in induce different restrictions in graph space: for the 
Delaunay triangulation and the Alpha complex, for example, clique sizes 
cannot exceed d + 1. Although no such blanket restriction applies to the 
Cech complex, for this complex some graphs are still unattainable — for ex- 
ample, no Cech complex can include a star graph whose central node has 
degree higher than the "kissing number," i.e., maximal number of disjoint 
unit hyperspheres touching a given hypersphere, 6 for d = 2, 12 for d = 3, 
etc. 

The Cech and Alpha complexes are hypergraphs indexed by a finite 
set V = {Vi, . . . ,Vn} C M'^ and a size parameter r > 0. Each induces a 
parametrization on the space of hypergraphs (V,r) ^ 7i{V,r). The class 
A of convex sets used to compute the nerve determines the space of hyper- 
graphs. To keep the notation simple, A will be implicit whenever obvious by 
the context. We will use ^(V, r) to denote a generic element of A for either 
the Cech or the Alpha complex. Similarly, 1-skeletons of nerves induce a 
parametrization of the spaces of graphs (V, r) Q(y,r). 
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Two principal advantages of this approach are: 

1. For each family of convex sets {A}, the number of parameters needed 
to specify the graph Q or hypergraph 7i grows only linearly with the 
number of vertices; 

2. The hypergraph parameter space will be a subset of M"', a very conve- 
nient parameter space for MCMC sampling. 

2. 1 . Example 

Here we use a junction tree factorization with each univariate marginal Xi 
associated to a point Vi ^M.'^ (the standard graphical models approach). In 
this case, specifying the class of sets to compute the nerve and the value 
for r determines a factorization for the joint density of {^i, . . . ,Xn}- We 
illustrate with n = 5 points in Euclidean space of dimension d = 2. 

Let (Xi, X2, X3, X4, X5) G be a random vector with density f{x) and 
consider the vertex set displayed in Table 1 (also shown as solid dots in 
Figures 4 and 5). For an Alpha complex with r = 0.5 the junction tree 



Coordinate 


Vi 


V2 




Vi 


V, 


X 


0.2065 


0.6383 


0.9225 


-0.8863 


0.3043 


y 


0.3149 


-0.1193 


-0.2544 


0.0816 


-0.9310 



Table 1 

Vertex set used for generating a factorization based on nerves. 



factorization (1.1b) corresponding to the graph in Figure 4 is 

^ V'i2(2;i,a;2)^235(a;2,X3,X5)V'4(a;4) 
V'2(a;2) 

we will denote the factorization as [1, 2] [2, 3, 5] [4] . In the case where the fac- 
tors are potential functions rather than marginals we will use {•} instead 
of [•]. Similarly, for the Cech complex and r = 0.7 the factorization corre- 
sponding to the graph in Figure 5 is 

f, . _ ^'1235(3^1, 3^2, 3^3i 3:5)-0i4(xi, X4) 
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2.2. Filtrations and Decompos ability 

A filtration is a sequence of simplicial complexes that properly include their 
predecessors: 

Definition 2.7 (Filtration). A filtration for a simplicial complex /C is a 
sequence ^ = ^JC^,JC^, . . . ,IC^^ of simplicial complexes such that 

= /C° C /C^ C . . . C /C'^ = /C, 

with all inclusions proper. 

Filtrations are commonly used in computational geometry and topology 
to construct efficient algorithms for computing specific nerves including the 
Alpha complex (11). The simplicial complexes constructed in Section 2 from 
families of convex sets lead to filtrations as the convex sets are enlarged, by 
increasing the radius parameter r. 

Although much of the graphical models literature focuses on Markov 
structures derived from weakly decomposable graphs, those constructed in 
Section 2 from 1-skeletons of Cech and Alpha complexes need not be weakly 
decomposable (see Figure 6). In Algorithm 1 we present an adaptation of 
this construction that generates weakly decomposable graphs, for use in ap- 
plications that require them. In Sections 4 and 5 we present methodology 
and examples for both weakly decomposable and unrestricted model spaces. 

The central idea for generating decomposable graphs from filtrations is to 
note that the complex ^(V, 0) for r = is disconnected and hence weakly 
decomposable; as the radius r increases, if one adds edges only if the resulting 
graph is weakly decomposable, then decomposability will hold for all r > 0. 
This procedure is formalized in Algorithm 1. 

Proposition 2.1. The graph Q produced by Algorithm 1 is weakly decom- 
posable. 

Proof. The algorithm is initialized with the weakly decomposable empty 
graph, and weak decomposability is tested with each proposed addition of 
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Algorithm 1: The algorithm takes as input a filtration of k abstract 
complexes and returns a decomposable graph that is a subset of the 
1-skeleton of the k-th. complex. 

input : a filtration .if ^ {lC° , K.^ , . . . , K.''} 
return: a weakly decomposable graph Q 

■m — 0; i — 0; Qo = 0; 
while i < k and K.'+^ / do 

T"i = G K.^^^ \1C such that jftj = 2|; // the edges in the set difference 

and denote r^.s as the sth edge 
if Ti 7^ then 

^• = 1^.1; 

for s = 1 to P do 

Q' — Qm U Ti,s ; // propose adding the edge 

if tt(^/') < tt(^/m) // fewer connected components? 

then 

Qm+i = Q' \ / / accept the proposal 
_ m = m + 1; 
else 

[ci] = "^{Qm) ; // the cliques 
[si] = .y{Qm) ; // the separators 
[ui,U2] = Ti,s ; // the proposed edge 
if 3i 7^ i, k with v\ £ Ci,V2 & Cj and Ci P| Cj = Sk then 

Qm+i = G' ; // accept the proposal 
_ m = m + 1; 

j = i + 1; 

Q ~ Qm 
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an edge {i.e., a 1-simplex in ^). The decomposability test is taken from 
(12, Theorem 2). Since only finitely many edges may possibly be added, G 
is weakly decomposable by construction. ■ 

2.2.1. Example 

We first illustrate the algorithm on a simple example, based on the five 
points in shown in Figure 7 and given in Table 2. The graph induced by 
a Cech complex with r = 0.5 will not be decomposable. Table 3 presents the 
evolution of cliques and separators with increasing r, as edges are proposed 
for inclusion in Algorithm 1. The first three proposed edge additions are 
accepted, but the proposal to add edge (1, 2) at radius r = 0.474 is rejected, 
since the intersection of prime components {1, 3} and {2, 4, 5} is empty, and 
therefore not a separator. 



Coordinate 




V2 


V3 


Vi 


Vs 


X 


0.686 


0.214 


0.846 


0.411 


0.089 


y 


0.151 


0.194 


0.420 


0.567 


0.553 



Table 2 

Vertex set used to illustrate Algorithm 1. 



Cliques 


Separators 


r 


Update 


[1][2][3][4][5] 









[1,3] [2] [4] [5] 




0.313 


(1,3) 


[1,3] [2] [4, 5] 




0.321 


(4,5) 


[1,31[2,5][4,5] 


[5] 


0.379 


(2,5) 


[1,3] [2, 4, 5] 




0.421 


(2,4) 


[1,3][3,4][2,4,5] 


[3] [4] 


0.459 


(3,4) 


[1,3][3,4][2,4,5] 


[3] [4] 


0.474 


~(1,2) 


[1,3, 4] [2, 4, 5] 


[4] 


0.498 


(1,4) 



Table 3 

Evolution of cliques and separators in the junction tree representation of Q as edges are 
added according to Algorithm 1. The proposed addition of edge (1,2) is rejected. 



imsart-generic ver. 2009/08/13 file: filt.tex date: December 18, 2009 



S. Lunagomez et al. /Conditional Independence Models 14 
2.2.2. Algorithm deletes few edges 

It is interesting to note that the number of proposed edge additions that 
are rejected by the algorithm is typically quite small. To illustrate this we 
applied Algorithm 1 to a filtration of Cech complexes corresponding to 100 
points sampled uniformly from the unit square, with radius r = 0.05. In 
Figure 8 the graph Q output by the algorithm is compared to the 1-skeleton 
of the Cech complex (with no decomposability restriction) with the same 
radius r = 0.05. Few edges appear in the Cech complex but not in Q. This 
occurs because geometric graphs tend to be triangulated, in the sense that 
if edges (fi,f2) and {v2,V3) belong to a geometric graph, then very likely 
the edge (^1,^3) will also be in the graph, preserving decomposability. 

3. Random Geometric Graphs 

In Section 2 we demonstrated how the geometry of a set V of n points in 
R'^ can be used to induce a graph Q. In this section we explore the relation 
between prior distributions on random sets V of points in M.'^ and features of 
the induced distribution on graphs Q, with the goal of learning how to tailor 
a point process model to obtain graph distributions with desired features. 

Definition 3.1 (Random Geometric Graph). Fix integers n, (i G N and let 
V = (Vi, . . . , Vn) be drawn from a probability distribution Q on (M*^)". For 
any class A of convex sets in and radius r > 0, the graph Q(V,r,A) is 
said to be a Random Geometric Graph (RGG). 

While Definition 3.1 is more general than that of (25, p. 2), it still cannot 
describe all the random graphs discussed in (26) (for example, those based 
on fc-neighbors cannot in general be generated by nerves). For A we will 
use closed balls in Mf^ or intersections of balls and Voronoi cells; most often 
Q will be a product measure under which the {Vi} will be n independent 
identically distributed (iid) draws from some marginal distribution Qnj on 
W^, such as the uniform distribution on the unit cube [0, l]'^ or unit ball B*^, 
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but we will also explore the use of repulsive processes for V under which the 
points {Vi} are more widely dispersed than under independence. It is clear 
that different choices for A, Q and r will have an impact on the support of 
the induced RGG distribution. To make this notion precise we define feasible 
graphs. First, 

Definition 3.2 (Isomorphic). Write Qi = Q2 for two graphs Qi = {Vi,£i) 
and call the graphs isomorphic if there is a 1:1 mapping X • ^1 ^ ^2 such 
that {vi,Vj) e £1 <^ {x{vi),x{vj)) G £2 for ah Vi,Vj G Vi. 



Definition 3.3 (Feasible Graph). Fix numbers d, n G N, a class A of convex 
sets in M'^, and a distribution Q on the random vectors V in (M*^)". A graph 
r is said to be feasible if for some number r > 0, 



In contrast to Erdos-Renyi models, where the inclusion of graph edges are 
independent events, the RGG models exhibit edge dependence that depends 
on the metric structure of Mf^ and the class A of convex sets used to construct 
the nerves. 

There is an extensive literature describing asymptotic distributions for a 
variety of graph features such as: subgraph counts, vertex degree, order of 
the largest clique, and maximum vertex degree (for an encyclopedic account 
of results for the important case of 1-skeletons of Gech complexes, see 25). 
Several results for the Delaunay triangulation, some of which generalize to 
the Alpha complex, are reported in (26). 

(author?) (25, Chap. 3) gives conditions which guarantee the asymptotic 
normality of the joint distribution of the numbers Qj of j-simplices (edges, 
triads, etc.), for iid samples V = (Vi, . . . , Vn) from some marginal distribu- 
tion Qm on M'^, as the number n = |V| of vertices grows and the radius r„ 
shrinks. 

Simulation studies suggest that the asymptotic results apply approxi- 
mately for n > 24-100. 



PT{g{V,r,A) ^ F} > 0. 
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3.1. Simulation Study of Subgraph Counts for RGGs 

In this subsection we study the distribution of particular graph features as a 
function of the samphng distribution of the random point set V and contrast 
this with Erdos-Renyi models. Specifically we will focus on the number of 
edges (2-cliques) Q2 and the number of 3-cliques Q^. 

The two spatial processes we study for Q are iid uniform draws from 
the unit square [0, 1]^ in the plane, and dependent draws from the Matern 
type III hard-core repulsive process (18), using Cech complexes with radius 
r = I/VTSO ~ 0.082 in both cases to ensure asymptotic normality (25, 
Thm. 3.13). In our simulations we vary both the number of variables (graph 
size) n and the Matern III hard core radius p. Comparisons are made with 
an Erdos-Renyi model with a common edge inclusion parameter. Table 4 
displays the quartiles for Q2 and Qj, as a function of the graph size n, hard 
core radius and Erdos-Renyi edge inclusion probability p. Figures 9, 10, 
and 11 show the joint distribution of {Q2, Q3) for {V-i} ~ Un([0, 1]^), for a 
Matern III process with hard core radius p = 0.35, and for draws from an 
Erdos-Renyi model with inclusion probability p = 0.065, respectively. 

These simulations illustrate that by varying the distribution Q we can 
control the joint distribution of graph features. The repulsive and iid uniform 
distribution have very similar edge distributions, for example (see Figures 9 
and 10), while (as anticipated) the repulsive process penalizes large cliques. 
Joint control of these features is not possible with an Erdos-Renyi model 
with a common edge inclusion probability and it is not obvious how to 
encode this type of information in the concordance function approach of 
(author?) (22). 

In Section 2.2 we proposed a procedure for generating decomposable 
graphs, and noted that the graphs induced by this algorithm are similar 
to those constructed without the decomposability restriction. In Figure 12 
we display a simulation study of the distribution of edge counts for a RGG 
and the restriction to decomposable graphs. These distributions are very 
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Graph 


|V| 


25% 


Edges 
50% 


75% 


25% 


3-Cliques 
50% 


75% 


Uniform 


75 


161 


171 


182 


134 


160 


190 


Matern (0.035) 


75 


154 


161 


170 


110 


124 


144 


ER (0.050) 


75 


130 


138 


146 


6 


8 


11 


ER (0.065) 


75 


172 


181 


189 


14 


18 


22 


Uniform 


50 


69 


75 


81 


34 


43 


57 


Matern (0.035) 


50 


66 


71 


76 


27 


35 


43 


Matern (0.050) 


50 


62 


67 


71 


22 


27 


33 


ER (0.050) 


50 


56 


61 


67 


1 


2 


4 


ER (0.065) 


50 


74 


79 


85 


3 


5 


7 


Uniform 


20 


9 


12 


14 


1 


2 


4 


Matern (0.035) 


20 


9 


11 


13 


1 


1 


3 


Matern (0.050) 


20 


8 


10 


12 





1 


2 


ER (0.050) 


20 


8 


9 


11 











ER (0.065) 


20 


10 


12 


15 








1 



Table 4 

Summaries of the empirical distribution of edge and 3-clique counts for Cech complex 
random geometric graphs with radius r = 0.082, for vertex sets sampled from iid draws 
from the unit square from: a uniform distribution, a hard core process with radius 
p — 0.035, and from Erdds-Renyi (ER) with common edge inclusion probabilities of 

p = 0.050 andp = 0.065. 



similar. 



4. Model specification 
4.1. General Setting 

We offer a Bayesian approach to the problem of inferring factorizations of 
distributions of the forms of (1.1), 

Y\a&.9'(g) i^aiXa \ Ga) 



f{x) = Jl (j)a{Xa I 6'a) or 

aeUs) Hfee-y (e) M^b I e^) 

In each case we specify the prior density function as a product 

p{e,G)=p{e\G)p{G) (4.1) 

of a conditional hyper Markov law for ^ G and a marginal RGG law on Q. 
We use conventional methods to select the specific hyper Markov distribution 
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(hyper Inverse Wishart for multivariate normal sampling distributions, for 
example) since our principal focus is on prior distributions for the graphs, 
p{G)- We also present MCMC algorithms for sampling from the posterior 
distribution on G, for observed data. 

4.2. Prior Specification 

All the graphs in our statistical models are built from nerves constructed in 
Section 2 from a random vertex set V = C M*^ and radius r > 0. 

Since the nerve construction is invariant under rigid transformations of V or 
simultaneous scale changes in V and r, restricting the support of the prior 
distribution on V to the unit ball B'^ does not reduce the model space: 

Proposition 4.1. Every feasible graph in'K'^ may be represented in the form 
Q{V,r,A) for a collection V of n points in the unit ball M'^ and for r = ^. 

Proof. Let G = {V,£) = G{V,r,A) be a feasible graph with |V| = n vertices. 
Every edge {vi,Vj) £ £ has length dist {vi,Vj) < 2r so, by the triangle 
inequality, every connected component Tj of Q with nj vertices must have 
diameter no greater than the longest possible path length, 2r(nj — 1), and 
so fits in a ball Bi of diameter 2r(nj — 1). The union of these balls, centered 
on a line segment and separated by r(2 + 1/n), will have diameter less than 
r(2n — 1). By translation and linear rescaling we may take r = 1/n and 
bound the diameter by 2, completing the proof. ■ 

We fix r = ;i and simplify the notation by writing Q{V,A) instead of 
G(y,r,A) for A = Cech or ^ = Alpha or, if A is understood, simply G{V). 
Thus we can induce prior distributions on the space of feasible graphs from 
distributions on configurations of n points in the unit ball in M'^. 

For iid uniform draws V = {Vi, . . . , Vn) from B*^, the expected number 
of edges of the graph Q(V,r,A) is bounded above by E.[#£] < (^){2r)'^; for 
r = i in dimension d = 2 this is less than E[^£] < 2, leading to relatively 
sparse graphs. We often take larger values of r (still small enough for empty 
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graphs to be feasible), to generate richer classes of graphs. A limit to how 
large r may be is given by the partial converse to Prop. 4.1, 

Proposition 4.2. The empty graph on n vertices cannot he expressed as 
g{V,r, Cech) for any V <ZM'^ with r > {n^/'^ - 

Proof. Let V = {Vi, . . . , V^} C B"^ be a set of points and r > a radius such 
that Qiy, r, Cech) is the empty graph. Then the balls Vi + rM'^ are disjoint 
and their union with d-dimensional volume nuj^r'^ lies wholly within the ball 
(1 + r)'E'^ of volume ^^(l + r)'^ (where ujd = n^^"^ /r{l+d/2) is the volume of 
the unit ball), so n < (1 + i)*^. ■ 

Slightly stronger, the empty graph may not be attained as ^(V, r, Cech) 
for any r > l/[(ri/prf)^/'^ — 1] where pd is the maximum spherical packing 
density in W^. For d = 2, this gives the asymptotically sharp bound r < 
l/ y^n^/vr - 1 

4-3. Sampling from Prior and Posterior Distributions 

Let Q be a probability distribution on n-tuples in W^, p{Q) the induced 
prior distribution on graphs ^(V, Cech) for V Q with r = and let 
p{0 I ^) be a conventional hyper Markov law (see below). We wish to draw 
samples from the prior distribution p{9,Q) of (4.1) and from the posterior 
distribution p{9,G \ x), given a vector x = (xi, x^v) of iid observations 
Xj ~ /(x I 6), using the Metropolis/Hastings approach to MCMC (15; 29, 
Ch. 7). 

We begin with a random walk proposal distribution in B'^ starting at an 
arbitrary point f G B'^, that approximates the steps 

a diffusion V^^^ on B'^ with uniform stationary distribution and reflecting 
boundary conditions at the unit sphere dM'^. 

The random walk is conveniently parametrized in spherical coordinates 
with radius p^^^ = \ V^^^\ and Euler angles — in d=2 dimensions, angle (f^^^ — 
at step t. Informally, we take independent radial random walk steps such that 
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is reflecting Brownian motion on the unit interval (this ensures that 
the stationary distribution will be Un(B'^)) and, conditional on the radius, 
angular steps from Brownian motion on the d-sphere of radius p*-*^. 

Fix some rj > 0. In d = 2 dimensions the reflecting random walk proposal 
{p*,{p*) we used for step {t + 1), beginning at {p^^\ip^^^), is: 

for iid standard normal random variables | ^p*^ , | , where 

R{x) = |x-2[i(x + l)J 
is X reflected (as many times as necessary) to the unit interval. Similar 



i/d 



expressions work in any dimension d, with p* = Ry[p^ '] + Cp ^^^d 
appropriate step sizes for the {d — 1) Euler angles. 

For small > this diffusion-inspired random walk generates local moves 
under which the proposed new point (p*, (p*) is quite close to {p^^\ f^^^) with 
high probability. To help escape local modes, and to simplify the proof of 
ergodicity below, we add the option of more dramatic "global" moves by 
introducing at each time step a small probability of replacing {p^^\ <f^^^) with 
a random draw (p*, ip*) from the uniform distribution on B*^ (see Figure 13). 
Let q{V* I V) denote the Lebesgue density at V* G (B'^)" of one step of this 
hybrid random walk for V = (Vi, . . . , Vn), starting at V G (B*^)*^. 



4-3.1. Prior Sampling 

To draw sample graphs from the prior distribution begin with V^"^ ~ Q{d\') 
and, after each time step t > 0, propose a new move to V* (?(V* | V^*^). 
The proposed move from V(*) (with induced graph a*^*^ = ^(V^*))) to V* 
(and G*) is accepted (whereupon V^*"*^^) = V*) with probability 1 A H^^\ 
the minimum of one and the Metropolis/Hastings ratio 

^(,) ^ p{\*) g(vW I V--) 
p(V(*)) q{V* I V(*))' 
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Otherwise V^*"^-'^) = V^*); in either case set t <— t+1 and repeat. Note the 
proposal distribution q{- \ ■) leaves the uniform distribution invariant, so 
= 1 for Q{dY) oc dV and in that case every proposal is accepted. 

4-3.2. Posterior Sampling 

After observing a random sample X = x = (xi, . . . , xn) from the distribu- 
tion Xj ~ /(x I 0,Q), let 

N 

f{^\e,g) = l[f{x,\e,g) 

i=l 

denote the likelihood function and 

M{g) = f f{^\e,g)p{e \ g)de (4.2) 

the marginal likelihood for g. For posterior sampling of graphs, a proposed 
move from V^*) to V* is accepted with probability 1 A -fT^*-* for 

rjit) ^ M{g*) p{V*) g(vW I V*) 

A^(gW) p(VW) q(V* I VW)' ^ ■ ' 

For multivariate normal data X and hyper inverse Wishart hyper Markov 
law p{9 I g), M{g) from (4.2) can be expressed in closed form for decompos- 
able graphs ^(V). efficient algorithms for evaluating (4.2) are still available 
even if this condition fails. 

The model will typically be of variable dimension, since the parameter 
space Qg for the factors may depend on the graph g = t/(V). Not all 
proposed moves of the point configuration V^*) V* will lead to a change 
in ^(V); for those that do we implement reversible-jump MCMC (13; 32) 
using the auxiliary variable approach of (author?) (4) to simplify the book- 
keeping needed for non-nested moves Qg ~^ Qg* . 
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4. 4- Convergence of the Markov chain 

Denote by Q{n,d,A) the finite set of feasible graphs with n vertices in 
M'^, i.e., those generated from 1-skeletons of ^-complexes. For each Q G 
g{n, d, A) let Vg C (B'^)" denote the set of aU points V = {Vi, . . . , T4} G 
(B'^)" for which g ^ g(V, ^,A), and set n{g) = Q{Vg). Then 

Proposition 4.3. The sequence = g(vW,i,^) induced by the prior 
M CMC procedure described in Section 4-3.1 samples each feasible graph g G 
g{n,d,A) with asymptotic frequency iJ,{g) . The posterior procedure described 
in Section 4-3.2 samples each feasible graph with asymptotic frequency fi{g \ 
x), the posterior distribution of g given the data x and hyper Markov prior 
pi0\G)- 

Proof. Both statements follow from the Harris recurrence of the Markov 
chain V^*) constructed in Section 4.3. For this it is enough to find a strictly 
positive lower bound for the probability of transitioning from an arbitrary 
point V G (B"^)" to any open neighborhood of another arbitrary point 
V* E (B'^)" (29, Theorem 6.38, pg. 225). This follows immediately from 
our inclusion of the global move in which all n points {Vi} are replaced with 
uniform draws from (B'^)". 

■ 

It is interesting to note that while the sequence = ^(V^*^ ^,A) is a 
hidden Markov process, it is not itself Markovian on the finite state space 
g{n,d,A); nevertheless it is ergodic, by Prop. 4.3. 

5. Four Simulation Examples 

We illustrate the use of the proposed parametrization in Bayesian inference 
for graphical models: this is done by specifying priors that encourage sparsity 
and the design of MCMC algorithms that allow for local as well as global 
moves. We offer four examples. The first example illustrates that our method 
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works when the graph encoding the Markov structure of underlying density is 
contained in the space of graphs spanned by the nerve used to fit the model. 
In the second example we apply our method to Gaussian Graphical Models. 
The third example shows that the nerve hypergraph (not just the 1-skeleton) 
can be used to induce different groupings in the terms of a factorization, 
and therefore a way to encode dependence features that go beyond pairwise 
relationships. In the fourth example we compare results obtained by using 
different filtrations to induce priors over different spaces of graphs. 

5.1. Example 1: Q is in the Space Generated by A. 

Let {Xi, . . . , Xiq) be a random vector whose distribution has factorization: 

■00 ix4:)'4'e ix8)lp9 ix9)lp9 {Xi , Xiq) 

(5.1a) 

The Markov structure of (5.1a) can be encoded by the geometric graph dis- 
played in Figure 14. We transform variables if necessary to achieve standard 
Un(0, 1) marginal distributions for each Xi, and model clique joint marginals 
with a Clayton copula (6, or 24, §4.6), the exchangeable multivariate model 
with joint distribution function 

^'e(x/) = I l-?i7 + ^x-' 
V iei 

and density function 

(5.1b) 

on [0, 1]"^ for some G = (0, oo), for each clique [vi : i £ I] of size rij. 

We drew 250 samples from model (5.1) with = 4. For inference about 
Q we set A = Alpha and r = 0.30, with independent uniform prior dis- 
tributions for the vertices Vi ~ Un(B^) on the unit ball in the plane. We 
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used the random walk described in Section 4.3 to draw posterior samples 
with Algorithm 1 applied to enforce decomposability. To estimate 6 we take 
a unit Exponential prior distribution 9 ~ Ex(l) and employ a Metropolis/ 
Hastings approach using a symmetric random walk proposal distribution 
with reflecting boundary conditions at = 0, 

with £t ~ Un(— /3,/3) for fixed /3 > 0. We drew 1000 samples after a burn-in 
period of 25 000 draws. The three models with the highest posterior proba- 
bilities are displayed in Table 5. The geometric graphs computed from nine 
posterior samples (one every 100 draws) are shown in Figure 15; note that 
the computed nerves appear to stabilize after a few hundred iterations while 
the actual position of the vertex set continues to vary. 



Graph Topology 


Posterior Probability 


[1, 4, 10] [1, 8, 10] [4, 5] [8, 9] [2, 3, 9] [6] [7] 


0.963 


[1,4, 10] [1,8, 10] [4, 5] [8, 9] [2, 3, 9] [6] [5, 7] 


0.021 


[1,4, 10] [1,8] [4, 5] [8, 9] [2, 3, 9] [6] [7] 


0.010 


Table 5 



The three models with highest estimated posterior probability. The true model is shown in 

bold (see Figure 14). Here 6 = 4:. 



5.2. Example 2: Gaussian graphical model 

We use our procedure to perform model selection for the Gaussian graphical 
model X No(0, T,g), where Q encodes the zeros in We adopt a Hyper 
Inverse Wishart (HIW) prior distribution for S | ^. The marginal likelihood 
(in the parametrization of 1, Eqn (12)) is given by 

AT/9 lG(v)i5 + N,D + X^X) 
M(y) = (2vr)-"^/2 (5.2) 
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where 



Igi6,D) 



|S|(^~2)/2g-i<S,D>^5. 



iM+{g) 

denotes the HIW normahzing constant. This quantity is available in closed 
form for weakly decomposable graphs ^(V), but for our unrestricted graphs 
(5.2) must be approximated via simulation. For our low-dimensional exam- 
ples the method of (1) suffices; for larger numbers of variables we recommend 
that of (5). We set (5 = 3 and D = 0.4/e-|-0.6J6 {Iq and Jq denote the identity 
matrix and the matrix of all ones, respectively). 

We sampled 300 observations from a Multivariate Normal with mean zero 
and precision matrix 



/ 18.18 
-6.55 


2.26 
-6.27 
V 



-6.55 
14.21 


-4.90 








10.47 



-3.65 



2.26 
-4.90 


10.69 





-6.27 




27.26 




\ 


-3.65 



7.41 J 



(5.3) 



whose conditional independence structure is given by the graph in Figure 16. 
We fit the model described in Section 4 using a uniform prior for each Vi G 
and r = 0.25. We employed hybrid random walk proposals in which we 
move all five vertices {Vi} independently according to the diffusion-inspired 
random walk described in Section 4.3 with probability 0.85; replace one 
uniformly selected vertex Vi with a uniform draw from Un(]B^) with prob- 
ability 0.05; and replace all five vertices with independent unoform draws 
from Un(B^) with probability 0.10. We sampled 1000 observations from the 
posterior after a burn in of 750 000. Results are summarized in Table 6 



5.3. Example 3; Factorization Based on Nerves 

While Gaussian joint distributions are determined entirely by the bivariate 
marginals, and so only edges appear in their complete-set factorizations (see 
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Graph Topology 


Posterior Probability 


[1,2, 4] [1,5] [3, 6] 


0.152 


[1,5][2,3,4][2,3,6] 


0.072 


[1,2, 3, 4, 6] [1,5] 


0.069 


[1,4] [2, 4] [2, 3, 6] 


0.055 


[1,2,4][2,3,4][1,5][3,6] 


0.052 



Table 6 

The five models with highest estimated posterior probability. The true model is shown in 

bold. 

(1.1a)); more complex dependencies are possible for other distributions. The 
famihar example of the joint distribution of three Bernoulli variables Xi, X2, 
each with mean 1/2, with Xi and X2 independent but X^ = {Xi — ^2)^ 
(so that {Xi} are only pairwise independent) has only the complete set 
{1,2,3} in its factorization. 

Consider now a model with the graphical structure illustrated in Figure 17 
whose density function, if it is continuous and strictly positive ((see 20, 
Prop. 3.1)), admits the complete-set factorization: 

f{x \Q,0) = eg (p{xi, X2)<l)ixi, Xg)(P{x2, Xq) ■ (p{x3, Xi, X5) . (5.4a) 

For illustration we will take each (/)(•) to be a Clayton copula density (see 
(5.1b)). For simplicity we specify the same value 6 = A for each association 
parameter, so f{x \ G,0) is given by (5.4a) with 

c^{x,y) = 5 (x-4 + y-4_i)-9/4 (^y)-5 (5.4b) 

<P{x, y, z) = 30(2;-^ + y-^ + z-^ - 2)-^^'^{x y z)-\ (5.4c) 

In earlier examples we associated graphical structures {i.e., edge sets) 
with 1-skeletons of nerves. We now associate %pergraphical structures {i.e., 
abstract simplicial complexes that may include higher-order simplexes) with 
the entire nerves, with maximal simplices associated with complete-set fac- 
tors. For example: the Alpha complex computed from the vertex set dis- 
played in Table 7 with r = 0.40 has {3, 4, 5}{1, 2}{1, 6}{2, 6} as its maximal 
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Coordinate 


Vi 


V2 


Vs 


Vi 


V5 


Ve 


X 


-0.0936 


-0.4817 


0.0019 


0.0930 


0.2605 


-0.5028 


y 


0.6340 


0.7876 


0.0055 


0.0351 


-0.0702 


0.2839 



Table 7 

Vertex set used for generating a factorization based on nerves. 



simplices (Figure 18). By associating a Clayton copula to each of these hy- 
peredges we recover the model shown in (5.4). 

We use the same prior and proposal distributions constructed in Sec- 
tion 4.3 from point distributions in M'^; what has changed is the way the 
nerve is being used: as a hypergraph whose maximal hyperedges represent 
factors. One complicating factor is the need to evaluate the normalizing fac- 
tor cg for each graph Q we encounter during the simulation; unavailable in 
closed form, we use Monte Carlo importance sampling to evaluate cg for 
each new graph, and store the result to be reused when Q recurs. 

We anticipate that uniform draws Vi ~ Un(]B^) will give high probability 
to clusters of three or more points within a ball of radius r, favoring higher- 
dimensional features (triangles and tetrahedra) in the induced hypergraph 
encoding the Markov structure of {-'^i}. To explore this phenomenon, we 
compare results for uniform draws with those from a repulsive process under 
which clusters of three or more points are unlikely to lie within a ball of 
radius r, hence favoring hypergraphs with only edges. 

We began by sampling 650 observations from model (5.4) with A = Alpha 
and r = 0.40, with independent uniform prior distributions for the vertices 
Vi ~ Un(B^) on the unit ball in the plane. The Metropolis/Hastings propos- 
als for the vertex set are given by a mixture scheme: 

• A random walk for each Vi as described in Section 4.3, with step size 
r] = 0.020. This proposal is picked with probability 0.94 

• An integer 1 < A; < 6 is chosen uniformly and, given k, a subset of size k 
from {1,2,3,4,5,6} is sampled uniformly; the vertices corresponding 
to those indices are replaced with random independent draws from 
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Un(B^). This proposal is picked with probabihty 0.06, 0.01 for each k. 

For we used the same standard exponential prior distribution and reflecting 
uniform random walk proposals described in Example 5.1. 

Using 5 000 posterior samples after a burn-in period of 95 000 iterations, 
the models with highest posterior probability are summarized in Table 8. 



Maximal Simplices 


Posterior Probability 


{3,4,5}{1,2}{2,6}{1,6} 


0.609 


{1,2,6}{3,4}{4,5}{3,5} 


0.161 


{3,5}{1,6}{3,4}{1,2}{2,6} 


0.137 



Table 8 

Highest posterior factorizations with uniform, prior for model of (5.4) and Figure 17 

(true model is shown in bold). 



To penalize higher-order simplexes we used a Strauss repulsive process 
(33) conditioned to have n points in B'^ as prior distribution for the vertex 
set, with Lebesgue density 

g{v) OC 7#'t(*j)- <iist{v„v,)<2R} 

for some < 7 < 1, penalizing each pair of points closer than 2R. Simulation 
results for this prior (with R = 0.7r and 7 = 0.75) are summarized in 
Table 9. The posterior mode is far more distinct for this prior than for the 
uniform prior shown in Table 8. 



Maximal Simplices 


Posterior Probability 


{3,4,5}{1,2}{2,6}{1,6} 


0.824 


{1,2,6}{3,4,5} 


0.111 


{1,2,6}{3,4}{3,5}{4,5} 


0.002 



Table 9 

Highest posterior factorizations with Strauss prior (true model is shown in bold). 

In a further experiment with 7 = 0.35, the posterior was concentrated on 
factorizations without any triads. 
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In the simulation studies of Sections 5.1-5.3 the class of sets A used to 
compute the nerve was known. In this example we investigate the behavior 
of our methodology when the class of convex sets used to fit the model differs 
from that used to generate the true graph. We consider three possibilities: 
A = Alpha in M^, ^ = Alpha in and A = Cech in M^. We performed 
two experiments: one when the graph is feasible for each the three classes, 
and another example where the graph could be generated by only two of the 
classes. 

First consider a model with junction tree factorization: 

/ex = -^—^ , (5-5 

whose conditional independence structure given by the graph of Figure 19. 
Again, the clique marginals are specified as a Clayton copula with 6 = 4. 
We simulated 300 samples from this distribution. 

We fitted the model with each of the three classes of convex sets using the 
Metropolis Hastings algorithm of Section 4.3 with random walk proposals 
on B"' (where d = 2 or 3, depending on A). Algorithm 1 was used to enforce 
decomposability, using r = 0.40 and r] = 0.020. The same exponential prior 
and uniform reflecting random- walk proposals for 9 were used as in Example 
5.1. Results of 1000 samples after a burn-in period of 50 000 draws are 
summarized in Table 10. Not surprisingly, the posterior mode coincided with 
the true model in all three cases. 

The second model we considered has junction tree factorization: 

X2, X4 )-0e(xi, X3, a:4) -0e(xi, X4, xs) 

The corresponding graph cannot be obtained from an Alpha complex in 
R^, but it is feasible for an Alpha complex in M'^ (Figure 20) or a Cech 
complex in M^. Using the same Clayton clique marginals before, we sampled 
300 observations from this distribution and fitted the model using the three 



imsart-generic ver. 2009/08/13 file: filt.tex date: December 18, 2009 



S. Lunagomez et al. /Conditional Independence Models 



30 



Nerve 


HPP Models 


Posterior 


Alpha in 


[1,3][2,3,4][5] 


0.964 




[1,3] [2, 3, 4] [1,5] 


0.012 




[1,3, 4] [2, 3, 4] [5] 


0.012 


Alpha in 


[1,3][2,3,4][5] 


0.982 




[1,3][2,3][3,4][5] 


0.011 




[1][2,3,4][5] 


0.003 


Cech in 


[1,3][2,3,4][5] 


0.595 




[1,2, 3, 4] [5] 


0.179 




[1,2,3,4,5] 


0.168 



Table 10 



Models with highest posterior probability. The table is divided according to the class of 
convex sets used when fitting the model. The true model is shown in bold. 



classes of convex sets. Results from 1 000 samples after a burn-in period of 
75 000 are summarized in Table 11. Observe that for Alpha complexes in M^, 
there is no clear posterior mode (unlike the previous example, or Sections 
5.1 and 5.3). The posterior mode for the Cech complex in and the Alpha 
complex in M'^ both match the true model. 



Nerve 


HPP Models 


Posterior 


Alpha in 


[1,2][1,3,4][1,4,5] 


0.214 




[1,2,4] [1,3,4] [1,3,5] 


0.115 




[1,2,4] [1,3,4] [3,4,5] 


0.112 


Alpha in R^ 


[1,2,4] [1,3,4] [1,4,5] 


0.976 




[1,2,3,4] [1,4,5] 


0.016 




[1,2,4] [1,3] [1,4,5] 


0.009 


Cech in R^ 


[1,2,4] [1,3,4] [1,4,5] 


0.758 




[1,2,4] [1,3,4] [1,3,5] 


0.177 




[1,2,4] [1,3,4] [4,5] 


0.148 



Table 11 

Models with highest posterior probability, for each class of convex sets. The true model 
(shown in bold) is unattainable for Alpha complexes in R^. 



6. Discussion 

In this article we present a new parametrization of graphs by associating 
them with finite sets of points in R'^. This perspective supports the design 
of informative prior distributions on graphs using familiar probability dis- 
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tributions of point sets in W^. It also induces new and useful Metropolis/ 
Hastings proposal distributions in graph space that include both local and 
global moves. As suggested by Helly's Theorem (10) characterizing the spar- 
sity of intersections of convex sets in W^, this methodology is particularly 
well suited for sparse graphs. The simple strategies presented here gener- 
alize easily to more detailed and subtle models for both priors and M/H 
proposals. 

An interesting feature of our approach is that the distribution on the 
space of graphs is modeled directly before the application of any specific 
hyper Markov law, in contrast to standard approaches in which it is the 
hyper Markov law that is used to encourage sparsity or other features on 
the graph. We think that working with the space of graphs explicitly opens 
a lot of possibilities for prior specification in graphical models, therefore, it 
is a perspective worth further study. 

Interesting questions and extensions of this idea include: (1) achieving a 
deeper and more detailed understanding of the subspace of graphs spanned 
by different specific filtrations; (2) designing priors to control the distribu- 
tions of specific features of graphs such as clique size or tree width; (3) mod- 
eling directed acyclic graphs (DAGs), and (4) concrete implementation of 
novel Markov structures based on nerves. 

This methodology generates only graphs that are feasible for the particu- 
lar filtration chosen. Although we do have some insight about which graphs 
can and cannot be generated by a specific filtrations, a more complete and 
formal understanding of this aspect of the problem would be useful. 

We used very simple prior distributions for the purpose of illustrating the 
core idea of the methodology. It is natural in this approach to incorporate 
tools from point processes into graphical models to define new classes of 
priors for graph space. Future developments in our research will involve a 
range of repulsive and cluster processes. 

The parametrization we propose can be used to represent Markov struc- 
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tures on DAGs, but the strategies for obtaining such graphs from nerves 
will be different and will establish stronger connections between Graphical 
Models and computational topology. 

The present work is related to that of (author?) (27) in which a nerve 
of convex subsets in is used to obtain Markov structures for a distri- 
bution, an extension of the abstract tube theory of (author?) (23). This 
new perspective allows for constructions that generalize the idea of junction 
trees. By modifying our methodology according to this framework (personal 
communication from H. Wynn suggests that this is feasible) we hope to fit 
models that factorize according to those novel Markov structures. 

Another possible extension of this work is to discretize the set from which 
the vertex set is sampled {e.g., use a grid). Such discretization may improve 
the behaviour of the MCMC; it would also allow the use of a nonparametric 
form for the prior on the vertex set, leading to more flexible priors on graph 
space. 
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Fig 1. Proximity graph for 100 vertices and radius r — 0.05. 
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Fig 2. (a) A set of vertices in are used to construct a family of disks of radius r = 0.5. 
(b) The nerve of this convex set. This is an example of a Cech complex, (c) For the same 
vertex set the Voronoi diagram is computed, (d) The nerve of the Voronoi cells is obtained. 
This is an example of the Delaunay triangulation. Note that the maximum clique size of 
the Delaunay is bounded by the dimension of the space of the vertex set plus one; such a 
restriction does not apply to the Cech complex. 
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Fig 3. (a) Given a set of vertices and a radius (r — one can compute Ai = d 11 Bi, 
where d is the Voronoi cell for vertex i and Bi is the hall of radius r centered at vertex 
i. (b) The Alpha complex is the nerve of the Ai 's. (c) Often the main interest will be the 
1-skeleton of the complex, which is the subset of the nerve that corresponds to (nonempty) 
pairwise intersections. 
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(a) (b) (c) 




Fig 7. (a) Proximity graph computed from the vertex set given in Table 2. (h) The decom- 
posable graph computed from the same vertex set using Algorithm 1. The edge (1,2) is not 
included in the decomposable graph. 
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Fig 8. (a) The 1-Skeleton of Cech complex given the displayed point set and r = 0.05. (b) 
The decomposable graph for the same complex, point set, and radius output by Algorithm 
1. 




140 160 180 200 220 

Edge Count 



240 



Fig 9. Edge counts and 3-Clique counts from 2,500 simulated samples of 
g(V, 1/V2~75, Cech) where |V| = 75 and V, ~ Un{[0, if), 1 < i < 75. The multivariate 
normal appears as a reasonable approximation for the joint distribution, as suggested by 
(25, Thm. 3.13). Cech radius is r-n = l/\/2n. 



imsart-generic ver. 2009/08/13 file: filt.tex date: December 18, 2009 



S. Lunagomez et al. /Conditional Independence Models 



43 




Fig 10. Edge counts and 3-Clique counts from 2,500 simulated samples of 
G{V, l/\/2 • 75, Cech) where |V| = 75 and V sampled from a Mattern III with hard-core 
radius r — 0.35. 




Fig 11. Edge counts and 3-Chque counts from 1000 simulated samples of an Erdos-Renyi 
graph with edge inclusion probability of p — 0.065. 
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(a) (b) 




-1.5 -1 -0.5 0.5 1 1.5 -1.5 -1 -0.5 0.5 1 1.5 



X X 



(c) (d) 




-1 .5 -1 -0.5 0.5 1 1 .5 -1 .5 -1 -0.5 0.5 1 1 .5 



X X 

Fig 13. This figure illustrates a local move, (a) The current configuration of the points, (h) 
The graph implied by this configuration, (c) The proposal configuration which is obtained 
by randomly moving one vertex, (d) The graph implied by the proposed move. 
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Fig 14. Geometric graph representing the model given in (5.1a). For this example graphs 
are constructed to he decomposable and the clique marginals are specified as Clayton cop- 
ulas. 
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Fig 15. Geometric graphs corresponding to snapshots of posterior samples (one every 100 
iterations) from model of (5.1a). For this example graphs are constructed to be decompos- 
able and the clique marginals are specified as Clayton copulas. 
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Fig 16. Graph encoding the Markov structure of the model given by precision matrix (5.3). 




Fig 17. Graph encoding the Markov structure of the model given in (5.4). 
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Fig 18. Alpha complex corresponding to the vertex set in Table 7 and r — VO.075. 
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Fig 19. Graph encoding the Markov structure of the model given in (5.5). 
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